
Uncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats



Abstract

The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward outcome estimation affects action choice and learning rate. We designed a choice task in which rats selected either the left or the right poke hole and stochastically received a food-pellet reward. The reward probabilities of the left and right holes were drawn from six settings (high, 100% vs. 66%; mid, 66% vs. 33%; low, 33% vs. 0% for the left vs. right holes, and the opposites), and the setting changed every 20–549 trials. We used Bayesian Q-learning models to estimate the time course of the probability distribution of action values. We then tested whether these models explain the rats' behavior better than standard Q-learning models, which estimate only the mean of action values. Model comparison by cross-validation revealed that a Bayesian Q-learning model with asymmetric updates for reward and non-reward outcomes best fit the time course of the rats' choices. In the action-choice equation of this Bayesian Q-learning model, the estimated coefficient for the variance of the action value was positive, meaning that the rats were uncertainty seeking. Further analysis of the model suggested that uncertainty increased the effective learning rate. These results suggest that rats take uncertainty in action-value estimation into account. They also suggest that rats have an uncertainty-seeking action policy and an uncertainty-dependent modulation of the effective learning rate.
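The abstract does not give the model equations. The following is a minimal illustrative sketch, not the authors' model, of how a Bayesian Q-learner can track both the mean and the variance of each action value on this task.

It rests on several assumptions:

- Each hole's reward probability has a Beta posterior.
- The hypothetical parameters `k_pos`/`k_neg` weight reward and non-reward outcomes asymmetrically.
- A hypothetical forgetting factor `decay` shrinks the counts toward a Beta(1, 1) prior.
- The choice rule is a softmax over `beta * mean + phi * variance`, with a hypothetical coefficient `phi` (phi > 0 corresponds to uncertainty seeking).

Under these assumptions the posterior mean moves by `k / (n + k)` times the prediction error, where `n = a + b`. A less certain (smaller `n`, larger variance) estimate therefore learns faster. This is one way an uncertainty-dependent effective learning rate can arise.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class BetaValue:
    """Beta posterior over one hole's reward probability (hypothetical model)."""
    a: float = 1.0  # pseudo-count of rewards; Beta(1, 1) is a uniform prior
    b: float = 1.0  # pseudo-count of non-rewards

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def var(self) -> float:
        n = self.a + self.b
        return self.a * self.b / (n * n * (n + 1.0))

    def update(self, reward: int, k_pos: float = 1.0, k_neg: float = 1.0,
               decay: float = 1.0) -> float:
        """Forget, then apply an asymmetric Bayesian update.

        Returns the effective learning rate: the change in the posterior
        mean divided by the prediction error (reward - old mean).
        """
        # Hypothetical forgetting: shrink counts toward the Beta(1, 1) prior
        # so the estimate stays responsive after the reward setting changes.
        self.a = 1.0 + decay * (self.a - 1.0)
        self.b = 1.0 + decay * (self.b - 1.0)
        old_mean = self.mean
        delta = reward - old_mean
        if reward:
            self.a += k_pos  # rewarded trial adds k_pos to the reward count
        else:
            self.b += k_neg  # unrewarded trial adds k_neg to the non-reward count
        return (self.mean - old_mean) / delta if delta != 0 else 0.0


def p_left(left: BetaValue, right: BetaValue,
           beta: float = 5.0, phi: float = 20.0) -> float:
    """Two-option softmax on beta * mean + phi * variance (phi > 0: uncertainty seeking)."""
    x = beta * (left.mean - right.mean) + phi * (left.var - right.var)
    return 1.0 / (1.0 + math.exp(-x))


# Reward probabilities (left, right) for the six settings listed in the abstract.
SETTINGS = [(1.0, 0.66), (0.66, 0.33), (0.33, 0.0),
            (0.66, 1.0), (0.33, 0.66), (0.0, 0.33)]


def simulate(n_trials: int = 2000, seed: int = 0):
    """Simulate the task: the setting switches after 20-549 trials (inclusive)."""
    rng = random.Random(seed)
    left, right = BetaValue(), BetaValue()
    choices, eff_lrs = [], []
    remaining = 0
    p_l, p_r = SETTINGS[0]
    for _ in range(n_trials):
        if remaining == 0:  # start a new block with a random setting and length
            p_l, p_r = rng.choice(SETTINGS)
            remaining = rng.randint(20, 549)
        remaining -= 1
        go_left = rng.random() < p_left(left, right)
        chosen, p = (left, p_l) if go_left else (right, p_r)
        reward = int(rng.random() < p)
        # Only the chosen action's posterior is updated in this sketch.
        eff_lrs.append(chosen.update(reward, k_pos=1.0, k_neg=0.5, decay=0.95))
        choices.append(go_left)
    return choices, eff_lrs
```

For example, `BetaValue().update(1)` returns about 1/3 (n = 2), while `BetaValue(a=30.0, b=15.0).update(1)` returns about 1/46 (n = 45). The more uncertain estimate, with larger `var`, has the larger effective learning rate. With `phi > 0`, `p_left` prefers the more uncertain hole when the two means are equal.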
